A novel β1-6 N-acetylglucosaminyltransferase, its acceptor molecule, leukosialin, and a method
for cloning proteins having enzymatic activity.

The present invention provides a novel β1-6 N-acetylglucosaminyltransferase, which forms core 2
oligosaccharide structures in O-glycans, and a novel acceptor molecule, leukosialin, CD43, for core 2
β1-6 N-acetylglucosaminyltransferase activity. The amino acid sequences and nucleic acid sequences
encoding these molecules, as well as active fragments thereof, also are disclosed. A method for isolating
nucleic acid sequences encoding proteins having enzymatic activity is disclosed, using CHO cells that
support replication of plasmid vectors having a polyoma virus origin of replication. A method to obtain
a suitable cell line that expresses an acceptor molecule also is disclosed.

This work was supported by grants CA33000 and CA33895 awarded by the National Cancer Institute.
The United States Government has certain rights in this invention.

BACKGROUND OF THE INVENTION 

FIELD OF THE INVENTION 

This invention relates generally to the fields of biochemistry and molecular biology and more
specifically to a novel human enzyme, UDP-GlcNAc:Galβ1-3GalNAc (GlcNAc to GalNAc) β1-6
N-acetylglucosaminyltransferase (core 2 β1-6 N-acetylglucosaminyltransferase; C2GnT), and to a novel
acceptor molecule, leukosialin, CD43, for core 2 β1-6 N-acetylglucosaminyltransferase action. The
invention additionally relates to DNA sequences encoding core 2 β1-6 N-acetylglucosaminyltransferase
and leukosialin, to vectors containing a C2GnT DNA sequence or a leukosialin DNA sequence, to
recombinant host cells transformed with such vectors and to a method of transient expression cloning
in CHO cells for identifying and isolating DNA sequences encoding specific proteins, using CHO cells
expressing a suitable acceptor molecule.

BACKGROUND INFORMATION 

Most O-glycosidic oligosaccharides in mammalian glycoproteins are linked via N-acetylgalactosamine to
the hydroxyl groups of serine or threonine. These O-glycans can be classified into 4 different groups
depending on the nature of the core portion of the oligosaccharides (see Fig. 1). Although less well
studied than N-glycans, O-glycans likely have important biological functions. Indeed, the presence of
O-linked oligosaccharides with the core 2 branch, Galβ1-3(GlcNAcβ1-6)GalNAc, has been demonstrated in
many biological processes.

Piller et al., J. Biol. Chem. 263:15146-15150 (1988) reported that human T-cell activation is associated
with the conversion of core 1-based tetrasaccharides to core 2-based hexasaccharides on leukosialin, a
major sialoglycoprotein present on human T lymphocytes (see also Fig. 1). A similar increase in
hexasaccharides was observed in peripheral blood lymphocytes of patients suffering from T-cell leukemias
(Saitoh et al., Blood 77:1491-1499 (1991)), myelogenous leukemias (Brockhausen et al., Cancer Res.
51:1257-1263 (1991)) and immunodeficiency due to AIDS and the Wiskott-Aldrich syndrome (Piller et al.,
J. Exp. Med. 173:1501-1510 (1991)). In these patients' lymphocytes, changes in the amount of
hexasaccharides were caused by increased activity of either UDP-GlcNAc:Galβ1-3GalNAc (GlcNAc to GalNAc)
6-β-D-N-acetylglucosaminyltransferase (EC 2.4.1.102) or core 2 β1-6 N-acetylglucosaminyltransferase
(Williams et al., J. Biol. Chem. 255:11253-11261 (1980)). Increased activity of core 2 β1-6
N-acetylglucosaminyltransferase also was observed in metastatic murine tumor cell lines as compared to
their parental, non-metastatic counterparts (Yousefi et al., J. Biol. Chem. 266:1772-1782 (1991)).

Increased complexity of the attached oligosaccharides increases the molecular weight of the
glycoprotein. For example, leukosialin containing hexasaccharides has a molecular weight of ~135 kDa,
whereas leukosialin containing tetrasaccharides has a molecular weight of ~105 kDa (Carlsson et al.,
J. Biol. Chem. 261:12779-12786 and 12787-12795 (1986)).

Fox et al., J. Immunol. 131:762-767 (1983) raised a monoclonal antibody, T305, against human
T-lymphocytic leukemia cells. Sportsman et al., J. Immunol. 135:158-164 (1985) reported T305 binding was
abolished by neuraminidase treatment, suggesting T305 binds to hexasaccharides. T305 specifically reacts
with the high molecular weight form of leukosialin (Saitoh et al., supra, (1991)).

Previous studies indicated poly-N-acetyllactosamine repeats extend almost exclusively from the branch
formed by the core 2 β1-6 N-acetylglucosaminyltransferase (Fukuda et al., J. Biol. Chem.
261:12796-12806 (1986)). Consistent with these results, Yousefi et al., supra, (1991) demonstrated that
the core 2 enzyme in metastatic tumor cells regulates the level of poly-N-acetyllactosamine synthesis
in O-linked oligosaccharides.

Poly-N-acetyllactosamines are subject to a variety of modifications, including the formation of the
sialyl Le^x, NeuNAcα2-3Galβ1-4(Fucα1-3)GlcNAc-, or the sialyl Le^a, NeuNAcα2-3Galβ1-3(Fucα1-4)GlcNAc-,
determinants (Fukuda, Biochim. Biophys. Acta 780:119-150 (1985)). Such modifications are significant
because these determinants, which are present on neutrophils and monocytes, serve as ligands for E- and
P-selectin present on endothelial cells and platelets, respectively (see, for example, Larsen et al.,
Cell 63:467-474 (1990)).

In addition, tumor cells often express a significant amount of sialyl Le^x and/or sialyl Le^a on their
cell surfaces. The interaction between E-selectin or P-selectin and these cell surface carbohydrates may
play a role in tumor cell adhesion to endothelium during the metastatic process (Walz et al., supra,
(1990)). Kojima et al., Biochem. Biophys. Res. Commun. 182:1288-1295 (1992) reported that
selectin-dependent tumor cell adhesion to endothelial cells was abolished by blocking O-glycan synthesis.
Complex sulfated O-glycans also may serve as ligands for the lymphocyte homing receptor, L-selectin
(Imai et al., J. Cell Biol. 113:1213-1221 (1991)).

These reported observations establish core 2 β1-6 N-acetylglucosaminyltransferase as a critical
enzyme in O-glycan biosynthesis. The availability of core 2 β1-6 N-acetylglucosaminyltransferase will
allow the in vivo and in vitro production of specific glycoproteins having core 2 oligosaccharides and
subsequent study of the effects of these variant O-glycans on cell-cell interactions. For example, core 2
β1-6 N-acetylglucosaminyltransferase is a useful marker for transformed or cancerous cells. An
understanding of the role of core 2 β1-6 N-acetylglucosaminyltransferase in transformed and cancerous
cells may elucidate a mechanism for the aberrant cell-cell interactions observed in these cells. In order
to understand the control of expression of these oligosaccharides and their function, isolation of a cDNA
clone for core 2 β1-6 N-acetylglucosaminyltransferase is a prerequisite. However, the DNA sequence
encoding core 2 β1-6 N-acetylglucosaminyltransferase has not yet been reported.

Thus a need exists for identifying the core 2 β1-6 N-acetylglucosaminyltransferase and the DNA
sequences encoding this enzyme. The present invention satisfies this need and provides related
advantages as well.

SUMMARY OF THE INVENTION

The present invention generally relates to a novel purified human β1-6
N-acetylglucosaminyltransferase. A cDNA sequence encoding a 428 amino acid protein having β1-6
N-acetylglucosaminyltransferase activity also is provided. The purified human β1-6
N-acetylglucosaminyltransferase, or an active fragment thereof, catalyzes the formation of critical
branches in O-glycans.

The invention further relates to a novel purified acceptor molecule, leukosialin, CD43, for core 2 β1-6
N-acetylglucosaminyltransferase activity. The leukosialin cDNA encodes a novel variant leukosialin, which
is created by alternative splicing of the genomic leukosialin DNA sequence.

Isolated nucleic acids encoding either core 2 β1-6 N-acetylglucosaminyltransferase or leukosialin are
disclosed, as are vectors containing the nucleic acids and recombinant host cells transformed with such
vectors. The invention further provides methods of detecting such nucleic acids by contacting a sample
with a nucleic acid probe having a nucleotide sequence capable of hybridizing with the isolated nucleic
acids of the present invention. The core 2 β1-6 N-acetylglucosaminyltransferase and leukosialin amino
acid and nucleic acid sequences disclosed herein can be purified from human cells or produced using well
known methods of recombinant DNA technology.

The invention also discloses a method of isolating nucleic acid sequences encoding proteins that have
an enzymatic activity. Such a nucleic acid sequence is obtained by transfecting the nucleic acid, which is
contained within a vector having a polyoma virus replication origin, into a Chinese hamster ovary (CHO)
cell line simultaneously expressing polyoma virus large T antigen and the acceptor molecule for the
protein having an enzymatic activity.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 depicts the structures and biosynthesis of O-glycans. Structures of O-glycan cores can be
classified into 4 groups (core 1 to core 4), each of which is synthesized starting with
GalNAcα1-Ser/Thr. The core 1 structure is synthesized by the addition of a β1-3 Gal residue to the GalNAc
residue. The core 1 structure can be converted to core 2 by the addition of a β1-6 N-acetylglucosaminyl
residue. This intermediate is usually converted to the hexasaccharide by sequential addition of galactose
and sialic acid residues (bottom right). The core 2 β1-6 N-acetylglucosaminyltransferase and the linkage
formed by the enzyme are indicated by a box. In certain cell types, the core 2 structure can be extended
by the addition of N-acetyllactosamine (Galβ1-4GlcNAcβ1-3) repeats to form poly-N-acetyllactosamine. In
the absence of core 2 β1-6 N-acetylglucosaminyltransferase, core 1 is converted to the monosialoform,
then to the disialoform by sequential addition of α2-3- and α2-6-linked sialic acid residues (bottom
left). Alternatively, core 3 can be synthesized by the addition of a β1-3 N-acetylglucosaminyl residue to
the GalNAc residue. Core 3 can be converted to core 4 by another β1-6 N-acetylglucosaminyltransferase
(top of figure).

Figure 2 depicts genomic DNA sequence (SEQ. ID. NO. 1) and cDNA sequence (SEQ. ID. NO. 1) of
leukosialin. The genomic sequence is numbered relative to the transcriptional start site. Exon 1 and
exon 2 have been previously described. Exon V is newly identified here. In the isolated cDNA, exon V is
immediately followed by the exon 2 sequence. Deduced amino acids (SEQ. ID. NO. 2) are presented under
the coding sequence, which begins in exon 2. A portion of the exon 2 sequence is shown.

Figure 3 establishes the ability of pGT/hCG to replicate in CHO cell lines expressing polyoma large T
antigen and leukosialin. In panel A, six clonal CHO cell lines were examined for replication of
pcDNAI-based pGT/hCG (lanes 1-6). In panel B, replication in cell clone 5 (CHO-Py-leu) was further
examined by treatment with increasing concentrations of DpnI and XhoI (lanes 2 and 3). Plasmid DNA
isolated from MOP-8 cells was used as a control (lane 1). Plasmid DNA was extracted using the Hirt
procedure and samples were digested with XhoI and DpnI. In parallel, pGT/hCG plasmid purified from
E. coli MC1061/P3 was digested with XhoI and DpnI (lane 7 in panel A and lane 4 in panel B) or XhoI alone
(lane 8 in panel A and lane 5 in panel B). The arrow indicates the migration of plasmid DNA resistant to
DpnI digestion. The arrowheads indicate plasmid DNA digested by DpnI.

Figure 4 shows the expression of T305 antigen directed by pcDNAI-C2GnT. Subconfluent CHO-Py-leu cells
were transfected with pcDNAI-C2GnT (panels A and B) or mock-transfected with pcDNAI (panels C and D).
Sixty-four hours after transfection, the cells were fixed, then incubated with mouse T305 monoclonal
antibody followed by fluorescein isothiocyanate-conjugated sheep anti-mouse IgG (panels A, B and C). Two
different areas are shown in panels A and B. Panel D shows a phase micrograph of the same field shown in
panel C. Bar = 20 μm.

Figure 5 depicts the cDNA sequence (SEQ. ID. NO. 3) and translated amino acid sequence (SEQ. ID.
NO. 4) of core 2 β1-6 N-acetylglucosaminyltransferase. The open reading frame and full-length nucleotide
sequence of C2GnT are shown. The signal/membrane-anchoring domain is doubly underlined. The
polyadenylation signal is boxed. Potential N-glycosylation sites are marked with asterisks. The sequences
are numbered relative to the translation start site.

Figure 6 shows the expression of core 2 β1-6 N-acetylglucosaminyltransferase mRNA in various cell
types. Poly(A)+ RNA (11 ng) from CHO-Py-leu cells (lane 1), HL-60 promyelocytes (lane 2), K562
erythrocytic cells (lane 3), and SP and L4 colonic carcinoma cells (lanes 4 and 5) was resolved by
electrophoresis. RNA was transferred to a nylon membrane and hybridized with a radiolabeled fragment of
pPROTA-C2GnT. Migration of RNA size markers is indicated.

Figure 7 illustrates the construction of the vector encoding the protein A-C2GnT fusion protein. The
cDNA sequence corresponding to Pro 38 to His 428 was fused in frame with the IgG binding domain of
S. aureus protein A (bottom; SEQ. ID. NOS. 7 and 8). The sequence includes the cleavable signal peptide,
which allows secretion of the fused protein. The coding sequence is under control of the SV40 promoter.
The remainder of the vector sequence shown was derived from rabbit β-globin gene sequences, including
an intervening sequence (IVS) and a polyadenylation signal (An).

DETAILED DESCRIPTION OF THE INVENTION

The present invention generally relates to a novel human core 2 β1-6 N-acetylglucosaminyltransferase.
The invention further relates to a novel method of transient expression cloning in CHO cells that was
used to isolate the cDNA sequence encoding human core 2 β1-6 N-acetylglucosaminyltransferase (C2GnT). The
invention also relates to a novel human leukosialin, which is an acceptor molecule for core 2 β1-6
N-acetylglucosaminyltransferase activity.

Cells generally contain extremely low amounts of glycosyltransferases. As a result, cDNA cloning based
on screening using an antibody or a probe based on the glycosyltransferase amino acid sequence has met
with limited success. However, isolation of cDNAs encoding various glycosyltransferases can be achieved
by transient expression of cDNA in recipient cells.

Successful application of the transient expression cloning method to isolate a cDNA sequence encoding 
a glycosyltransferase requires an appropriate recipient cell line. Ideal recipient cells should not express the 
glycosyltransferase of interest. As a result, the recipient cells would normally lack the oligosaccharide 
structure formed by such a glycosyltransferase. 

Expression of the cloned glycosyltransferase cDNA in the recipient cell line should result in formation of
the specific oligosaccharide structure. The resultant oligosaccharide can be identified using a specific
antibody or lectin that recognizes the structure. The recipient cell line also must support replication of an
appropriate plasmid vector.

COS-1 cells initially appear to satisfy the requirements for using the transient expression method.
COS-1 cells express SV40 large T antigen and support the replication of plasmid vectors harboring an SV40
replication origin (Gluzman et al., Cell 23:175-182 (1981)). Although COS-1 cells, themselves, express a
variety of glycosyltransferases, COS-1 cells have been used to clone cDNA sequences encoding human
blood group Lewis α1-3/4 fucosyltransferase and murine α1-3 galactosyltransferase (Kukowska-Latallo et
al., Genes and Devel. 4:1288-1303 (1990); Larsen et al., Proc. Natl. Acad. Sci. USA 86:8227-8231 (1989)).
Also, Goelz et al., Cell 63:175-182 (1990), utilized an antibody that inhibits E-selectin mediated
adhesion to isolate a cDNA sequence encoding α1-3 fucosyltransferase.

An attempt was made to use COS-1 cells to isolate cDNA clones encoding core 2 β1-6
N-acetylglucosaminyltransferase. COS-1 cells were transfected using cDNA obtained from activated human T
cells, which express the core 2 β1-6 N-acetylglucosaminyltransferase. Transfected cells suspected of
expressing core 2 β1-6 N-acetylglucosaminyltransferase were identified by the presence of increased
levels of the core 2 oligosaccharide structure formed by core 2 β1-6 N-acetylglucosaminyltransferase
activity. The presence of the core 2 structure was identified using the monoclonal antibody, T305, which
identifies a hexasaccharide on leukosialin. A clone expressing high levels of the T305 antigen was
isolated and sequenced.

Surprisingly, transfection using COS-1 cells resulted in the isolation of a cDNA clone encoding a novel
variant of human leukosialin, which is the acceptor molecule for core 2 β1-6
N-acetylglucosaminyltransferase activity. Examination of the cDNA sequence of the newly isolated
leukosialin revealed the cDNA sequence was formed as a result of alternative splicing of exons in the
genomic leukosialin DNA sequence. Specifically, the newly isolated leukosialin is encoded by a cDNA
sequence containing a previously undescribed non-coding exon at the 5'-terminus (exon V in Figure 2;
SEQ. ID. NO. 1).

The unexpected result obtained using COS-1 cells led to the development of a new transfection system
to isolate a cDNA sequence encoding core 2 β1-6 N-acetylglucosaminyltransferase. CHO cells, which do
not normally express the T305 antigen, were transfected with DNA sequences encoding human leukosialin
and the polyoma virus large T antigen. A cell line, designated CHO-Py-leu, which expresses human
leukosialin and polyoma virus large T antigen, was isolated.

CHO-Py-leu cells were used for transient expression cloning of a cDNA sequence encoding core 2
β1-6 N-acetylglucosaminyltransferase. CHO-Py-leu cells were transfected with cDNA obtained from human
HL-60 promyelocytes. A plasmid, pcDNAI-C2GnT, which directed expression of the T305 antigen, was
isolated and the cDNA insert was sequenced (see Figure 5; SEQ. ID. NO. 3). The 2105 base pair cDNA
sequence encodes a putative 428 amino acid protein (SEQ. ID. NO. 4). The genomic DNA sequence
encoding the enzyme can be isolated using methods well known to those skilled in the art, such as nucleic
acid hybridization using the core 2 β1-6 N-acetylglucosaminyltransferase cDNA disclosed herein to screen,
for example, a genomic library prepared from HL-60 promyelocytes.
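
The relationship between the reported cDNA length and the reported protein length can be checked with simple arithmetic. The following Python sketch is illustrative only: the helper name orf_length_nt is hypothetical, and the only values taken from the text are the 2105 base pair cDNA and the 428 amino acid protein. It assumes three nucleotides per codon plus one stop codon.

```python
# Illustrative arithmetic only; orf_length_nt is a hypothetical helper name.
def orf_length_nt(n_residues: int, include_stop: bool = True) -> int:
    """Nucleotides needed to encode n_residues amino acids (3 per codon)."""
    codons = n_residues + (1 if include_stop else 0)
    return 3 * codons

CDNA_LENGTH_BP = 2105    # cDNA length stated in the text
PROTEIN_LENGTH_AA = 428  # protein length stated in the text

coding_nt = orf_length_nt(PROTEIN_LENGTH_AA)   # open reading frame incl. stop codon
untranslated_nt = CDNA_LENGTH_BP - coding_nt   # remainder outside the reading frame

print(f"coding: {coding_nt} nt, non-coding: {untranslated_nt} nt")
```

Under these assumptions the open reading frame accounts for 1287 of the 2105 base pairs, leaving 818 base pairs of untranslated sequence; how that remainder divides between the 5' and 3' ends is not derivable from the lengths alone.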

An enzyme similar to the disclosed human core 2 β1-6 N-acetylglucosaminyltransferase has been
purified from bovine tracheal epithelium (Ropp et al., J. Biol. Chem. 266:23863-23871 (1991), which is
incorporated herein by reference). The apparent molecular weight of the bovine enzyme is ~69 kDa. In
comparison, the predicted molecular weight of the polypeptide portion of core 2 β1-6
N-acetylglucosaminyltransferase is ~50 kDa. The deduced amino acid sequence of core 2 β1-6
N-acetylglucosaminyltransferase reveals two to three potential N-glycosylation sites, suggesting
N-glycosylation and O-glycosylation, or other post-translational modification, could account for the
larger apparent size of the bovine enzyme.

Expression of the cloned C2GnT sequence, or a fragment thereof, directed formation of the specific
O-glycan core 2 oligosaccharide structure. Although several cDNA sequences encoding glycosyltransferases
have been isolated (Paulson and Colley, J. Biol. Chem. 264:17615-17618 (1989); Schachter, Curr. Opin.
Struct. Biol. 1:755-765 (1991), which are incorporated herein by reference), C2GnT is the first reported
cDNA sequence encoding an enzyme involved exclusively in O-glycan synthesis.

In O-glycans, β1-6 N-acetylglucosaminyl linkages may occur in both core 2, Galβ1-3(GlcNAcβ1-6)GalNAc,
and core 4, GlcNAcβ1-3(GlcNAcβ1-6)GalNAc, structures (Brockhausen et al., Biochemistry
24:1866-1874 (1985), which is incorporated herein by reference). In addition, β1-6 N-acetylglucosaminyl
linkages occur in the side chains of poly-N-acetyllactosamine, forming the I-structure (Piller et al.,
J. Biol. Chem. 259:13385-13390 (1984), which is incorporated herein by reference), and in the side chain
attached to α-mannose of the N-glycan core structure, forming a tetraantennary saccharide (Cummings et
al., J. Biol. Chem. 257:13421-13427 (1982), which is incorporated herein by reference). The enzymes
responsible for these linkages all share the unique property that Mn2+ is not required for their activity.

Although it was originally suggested that these β1-6 N-acetylglucosaminyl linkages were formed by
the same enzyme (Piller et al., 1984), the present disclosure clearly demonstrates that the HL-60-derived
core 2 β1-6 N-acetylglucosaminyltransferase is specific for the formation only of O-glycan core 2. This
result is consistent with a recent report demonstrating that myeloid cell lysates contain the enzymatic
activity associated with core 2, but not core 4, formation (Brockhausen et al., supra, (1991)).

Analysis of mRNA isolated from colonic cancer cells indicated core 2 β1-6
N-acetylglucosaminyltransferase is expressed in these cells. Recent studies using affinity absorption
suggested at least two different β1-6 N-acetylglucosaminyltransferases were present in tracheal
epithelium (Ropp et al., supra, (1991)). One of these transferases formed core 2, core 4, and I
structures. Thus, at least one other β1-6 N-acetylglucosaminyltransferase present in epithelial cells
can form core 2, core 4 and I structures. Similarly, a β1-6 N-acetylglucosaminyltransferase present in
Novikoff hepatoma cells can form both core 2 and I structures (Koenderman et al., Eur. J. Biochem.
166:199-208 (1987), which is incorporated herein by reference).

The acceptor molecule specificity of core 2 β1-6 N-acetylglucosaminyltransferase is different from the
specificity of the enzymes present in tracheal epithelium and Novikoff hepatoma cells. Thus, a family of
β1-6 N-acetylglucosaminyltransferases can exist, the members of which differ in acceptor specificity but
are capable of forming the same linkage. Members of this family are isolated from cells expressing β1-6
N-acetylglucosaminyltransferase activity, using, for example, nucleic acid hybridization assays and
studies of acceptor molecule specificity. Such a family was reported for the α1-3 fucosyltransferases
(Weston et al., J. Biol. Chem. 267:4152-4160 (1992), which is incorporated herein by reference).

The formation of the core 2 structure is critical to cell structure and function. For example, the core 2
structure is essential for elongation of poly-N-acetyllactosamine and for formation of sialyl Le^x or
sialyl Le^a structures. Furthermore, the biosynthesis of cartilage keratan sulfate may be initiated by the
core 2 β1-6 N-acetylglucosaminyltransferase, since the keratan sulfate chain is extended from a branch
present in core 2 structure in the same way as poly-N-acetyllactosamine (Dickenson et al., Biochem. J.
269:55-59 (1990), which is incorporated herein by reference). Keratan sulfate is absent in wild-type CHO
cells, which do not express the core 2 β1-6 N-acetylglucosaminyltransferase (Esko et al., J. Biol. Chem.
261:15725-15733 (1986), which is incorporated herein by reference). These structures are believed to be
important for cellular recognition and matrix formation. The availability of the cDNA clone encoding the
core 2 β1-6 N-acetylglucosaminyltransferase will aid in understanding how the various carbohydrate
structures are formed during differentiation and malignancy. Manipulation of the expression of the
various carbohydrate structures by gene transfer and gene inactivation methods will help elucidate the
various functions of these structures.

The present invention is directed to a method for transient expression cloning in CHO cells of cDNA
sequences encoding proteins having enzymatic activity. Isolation of human core 2 β1-6
N-acetylglucosaminyltransferase is provided as an example of the disclosed method. However, the method can
be used to obtain cDNA sequences encoding other proteins having enzymatic activity.

For example, lectins and antibodies reactive with other specific oligosaccharide structures are available
and can be used to screen for glycosyltransferase activity. Also, CHO cell lines that have defects in
glycosylation have been isolated. These cell lines can be used to study the activity of the corresponding
glycosyltransferase (Stanley, Ann. Rev. Genet. 18:525-552 (1984), which is incorporated herein by
reference). CHO cell lines also have been selected for various defects in cellular metabolism, loss of
expression of cell surface molecules and resistance to cytotoxic drugs (see, for example, Malmstrom and
Krieger, J. Biol. Chem. 266:24025-24030 (1991); Yayon et al., Cell 64:841-848 (1991), which are
incorporated herein by reference). The approach disclosed herein should allow isolation of cDNA sequences
encoding the proteins involved in these various cellular functions.

As used herein, the terms "purified" and "isolated" mean that the molecule or compound is
substantially free of contaminants normally associated with a native or natural environment. For example,
a purified protein can be obtained by a number of methods. The naturally-occurring protein can be
purified by any means known in the art, including, for example, by affinity purification with antibodies
having specific reactivity with the protein. In this regard, anti-core 2 β1-6
N-acetylglucosaminyltransferase antibodies can be used to substantially purify naturally-occurring
core 2 β1-6 N-acetylglucosaminyltransferase from human HL-60 promyelocytes.

Alternatively, a purified protein of the present invention can be obtained by well known recombinant
methods, utilizing the nucleic acids disclosed herein, as described, for example, in Sambrook et al.,
Molecular Cloning: A Laboratory Manual 2d ed. (Cold Spring Harbor Laboratory 1989), which is incorporated
herein by reference, and by the methods described in the Examples below. Furthermore, purified proteins
can be synthesized by methods well known in the art.

As used herein, the phrase "substantially the sequence" includes the described nucleotide or amino
acid sequence and sequences having one or more additions, deletions or substitutions that do not
substantially affect the ability of the sequence to encode a protein having a desired functional activity.
In addition, the phrase encompasses any additional sequence that hybridizes to the disclosed sequence
under stringent hybridization conditions. Methods of hybridization are well known to those skilled in the
art. For example, sequence modifications that do not substantially alter such activity are intended. Thus,
a protein having substantially the amino acid sequence of Figure 5 (SEQ. ID. NO. 4) refers to core 2 β1-6
N-acetylglucosaminyltransferase encoded by the cDNA described in Example IV, as well as proteins having
amino acid sequences that are modified but, nevertheless, retain the functions of core 2 β1-6
N-acetylglucosaminyltransferase. One skilled in the art can readily determine such retention of function
following the guidance set forth, for example, in Examples V and VI.

The present invention is further directed to active fragments of the human core 2 β1-6
N-acetylglucosaminyltransferase protein. As used herein, an active fragment refers to portions of the
protein that substantially retain the glycosyltransferase activity of the intact core 2 β1-6
N-acetylglucosaminyltransferase protein. One skilled in the art can readily identify active fragments of
proteins such as core 2 β1-6 N-acetylglucosaminyltransferase by comparing the activities of a selected
fragment with the intact protein following the guidance set forth in the Examples below.

As used herein, the term "glycosyltransferase activity" refers to the function of a glycosyltransferase to
link sugar residues together through a glycosidic bond to create critical branches in oligosaccharides.
Glycosyltransferase activity results in the specific transfer of a monosaccharide to an appropriate
acceptor molecule, such that the acceptor molecule contains oligosaccharides having critical branches. One
skilled in the art would understand the terms "enzymatic activity" and "catalytic activity" to generally
refer to a function of certain proteins, such as the function of those proteins having glycosyltransferase
activity.

As used herein, the term "acceptor molecule" refers to a molecule that is acted upon by a protein
having enzymatic activity. For example, an acceptor molecule, such as leukosialin, as identified by the
amino acid sequence of Figure 2 (SEQ. ID. NO. 2), accepts the transfer of a monosaccharide due to
glycosyltransferase activity. An acceptor molecule, such as leukosialin, may already contain one or more
sugar residues. The transfer of monosaccharides to an acceptor molecule, such as leukosialin, results in
the formation of critical branches of oligosaccharides.

As used herein, the term "critical branches" refers to oligosaccharide structures formed by specific
glycosyltransferase activity. Critical branches may be involved in various cellular functions such as
cell-cell recognition. The oligosaccharide structure of a critical branch can be determined using methods
well known in the art, such as the method for determining the core 2 oligosaccharide structure, as
described in the Examples below.

Relatedly, the invention also provides nucleic acids encoding the human core 2 β1-6
N-acetylglucosaminyltransferase protein and leukosialin protein described above. The nucleic acids can be
in the form of DNA, RNA or cDNA, such as the novel C2GnT cDNA of 2105 base pairs identified in Figure 5
(SEQ. ID. NO. 3) or the novel leukosialin cDNA identified in Figure 2 (SEQ. ID. NO. 1), for example. Such
nucleic acids can also be chemically synthesized by methods known in the art, including, for example, the
use of an automated nucleic acid synthesizer.

The nucleic acid can have substantially the nucleotide sequence of C2GnT, identified in Figure 5 (SEQ.
ID. NO. 3), or leukosialin, identified in Figure 2 (SEQ. ID. NO. 1). Portions of such nucleic acids that
encode active fragments of the core 2 β1-6 N-acetylglucosaminyltransferase protein or leukosialin protein
of the present invention also are contemplated.

Nucleic acid probes capable of hybridizing to the nucleic acids of the present invention under
reasonably stringent conditions can be prepared from the cloned sequences or by synthesizing
oligonucleotides by methods known in the art. The probes can be labeled with markers according to
methods known in the art and used to detect the nucleic acids of the present invention. Such nucleic
acids can be detected by contacting the probe with a sample containing or suspected of containing the
nucleic acid under hybridizing conditions, and detecting the hybridization of the probe to the nucleic
acid.

The present invention is further directed to vectors containing the nucleic acids described above. The
term "vector" includes vectors that are capable of expressing nucleic acid sequences operably linked to
regulatory sequences capable of effecting their expression. Numerous cloning vectors are known in the art.
Thus, the selection of an appropriate cloning vector is a matter of choice. In general, useful vectors for
recombinant DNA are often plasmids, which refer to circular double-stranded DNA loops such as pcDNAI or
pcDSRα. As used herein, "plasmid" and "vector" may be used interchangeably as the plasmid is a
common form of a vector. However, the invention is intended to include other forms of expression vectors
that serve equivalent functions.

Suitable host cells containing the vectors of the present invention are also provided. Host cells can be
transformed with a vector and used to express the desired recombinant or fusion protein. Methods of
recombinant expression in a variety of host cells, such as mammalian, yeast, insect or bacterial cells, are
widely known. For example, a nucleic acid encoding core 2 β1-6 N-acetylglucosaminyltransferase or a
nucleic acid encoding leukosialin can be transfected into cells using the calcium phosphate technique or
other transfection methods, such as those described in Sambrook et al., supra, (1989).
Alternatively, nucleic acids can be introduced into cells by infection with a retrovirus carrying the gene
or genes of interest. For example, the gene can be cloned into a plasmid containing retroviral long terminal
repeat sequences, the C2GnT DNA sequence or the leukosialin DNA sequence, and an antibiotic resistance
gene for selection. The construct can then be transfected into a suitable cell line, such as PA12, which
carries a packaging-deficient provirus and expresses the necessary components for virus production,
including synthesis of amphotropic glycoproteins. The supernatant from these cells contains infectious
virus, which can be used to infect the cells of interest.

Isolated recombinant polypeptides or proteins can be obtained by growing the described host cells
under conditions that favor transcription and translation of the transfected nucleic acid. Recombinant
proteins produced by the transfected host cells are isolated using methods set forth herein and by methods
well known to those skilled in the art.

Also provided are antibodies having specific reactivity with the core 2 β1-6 N-acetylglucosaminyltransferase
protein or leukosialin protein of the present invention. Active fragments of antibodies, for example,
Fab and F(ab')2 fragments, having specific reactivity with such proteins are intended to fall within the
definition of an "antibody." Antibodies exhibiting a titer of at least about 1.5 x 10*, as determined by ELISA,
are useful in the present invention.

The antibodies of the invention can be produced by any method known in the art. For example,
polyclonal and monoclonal antibodies can be produced by methods described in Harlow and Lane,
Antibodies: A Laboratory Manual (Cold Spring Harbor 1988), which is incorporated herein by reference. The
proteins, particularly core 2 β1-6 N-acetylglucosaminyltransferase or leukosialin of the present invention,
can be used as immunogens to generate such antibodies. Altered antibodies, such as chimeric, humanized,
CDR-grafted or bifunctional antibodies, can also be produced by methods well known to those skilled in the
art. Such antibodies can also be produced by hybridoma, chemical synthesis or recombinant methods
described, for example, in Sambrook et al., supra, (1989).

The antibodies can be used for determining the presence or purification of the core 2 β1-6
N-acetylglucosaminyltransferase protein or the leukosialin protein of the present invention. With respect to the
detecting of such proteins, the antibodies can be used for in vitro or in vivo methods well known to those
skilled in the art.

Finally, kits useful for carrying out the methods of the invention are also provided. The kits can contain
a core 2 β1-6 N-acetylglucosaminyltransferase protein, antibody or nucleic acid of the present invention
and an ancillary reagent. Alternatively, the kit can contain a leukosialin protein, antibody or nucleic acid of
the present invention and an ancillary reagent. An ancillary reagent may include diagnostic agents, signal
detection systems, buffers, stabilizers, pharmaceutically acceptable carriers or other reagents and materials
conventionally included in such kits.

A cDNA sequence encoding core 2 β1-6 N-acetylglucosaminyltransferase was isolated and core 2
β1-6 N-acetylglucosaminyltransferase activity was determined. This is the first report of transient expression
cloning using CHO cells expressing polyoma large T antigen. The following examples are intended to
illustrate but not limit the present invention.

EXAMPLE I

EXPRESSION CLONING IN COS-1 CELLS OF THE cDNA FOR THE PROTEIN CARRYING THE HEXASACCHARIDES

COS-1 cells were transfected with a cDNA library, pcDSRα-2F1, constructed from poly(A)+ RNA of
activated T lymphocytes, which express the core 2 β1-6 N-acetylglucosaminyltransferase (Yokota et al.,
Proc. Natl. Acad. Sci. USA 83:5894-5898 (1986); Piller et al., supra, (1988), which are incorporated herein
by reference). COS-1 cells support replication of the pcDSRα constructs, which contain the SV40 replication
origin. Transfected cells were selected by panning using monoclonal antibody T305, which recognizes
sialylated branched hexasaccharides (Piller et al., supra, (1991); Saitoh et al., supra, (1991)). Methods
referred to in this example are described in greater detail in the examples that follow.

Following several rounds of transfection, one plasmid, pcDSRα-leu, directing high expression of the
T305 antigen was identified. The cloned cDNA insert was isolated and sequenced, then compared with
other reported sequences. The newly isolated cDNA sequence was nearly identical to the sequence
reported for leukosialin, except the 5'-flanking sequences were different (Pallant et al., Proc. Natl. Acad. Sci.
USA 86:1328-1332 (1989), which is incorporated herein by reference).

Comparison of the cloned cDNA sequence with the genomic leukosialin DNA sequence revealed the
start site of the cDNA sequence is located 259 bp upstream of the transcription start site of the previously
reported sequence (Figure 2; compare Exon 1' and Exon 1) (Shelley et al., Biochem. J. 270:569-576 (1990);
Kudo and Fukuda, J. Biol. Chem. 266:8483-8489 (1991), which are incorporated herein by reference). A
consensus splice site was identified at the exon-intron junction of the newly identified 122 bp exon 1' in
pcDSRα-leu (Breathnach and Chambon, Ann. Rev. Biochem. 50:349-383 (1981), which is incorporated
herein by reference). This splice site is followed by the exon 2 sequence.

These results indicate the T305 antibody preferentially binds to branched hexasaccharides attached to
leukosialin. Indeed, a small amount of the hexasaccharides (approximately 8% of the total) was detected in
O-glycans isolated from control COS-1 cells. T305 binding is similar to that of anti-M and anti-N antibodies, which
recognize both the glycan and polypeptide portions of the erythrocyte glycoprotein, glycophorin (Sadler et al.,
J. Biol. Chem. 254:2112-2119 (1979), which is incorporated herein by reference). These observations are
consistent with reports that only leukosialin strongly reacted with T305 in Western blots of leukocyte cell
extracts, even though leukocytes also express other glycoproteins, such as CD45, that must also contain
the same hexasaccharides (Piller et al., supra, (1991); Saitoh et al., supra, (1991)).

EXAMPLE II

ESTABLISHMENT OF CHO CELL LINES THAT STABLY EXPRESS POLYOMA VIRUS LARGE T ANTIGEN AND LEUKOSIALIN

T305 preferentially binds to branched hexasaccharides attached to leukosialin. Such hexasaccharides
are not present on the erythropoietin glycoprotein produced in CHO cells, although the glycoprotein does
contain the precursor tetrasaccharide (Sasaki et al., J. Biol. Chem. 262:12059-12076 (1987), which is
incorporated herein by reference). T305 antigen also is not detectable in CHO cells transiently transfected
with pcDSRα-leu. In order to screen for the presence of a cDNA clone expressing core 2 β1-6
N-acetylglucosaminyltransferase activity, a CHO cell line expressing both leukosialin and polyoma large T
antigen was established (see, for example, Heffernan and Dennis, Nucl. Acids Res. 19:85-92 (1991), which is
incorporated herein by reference).

Vectors: A plasmid vector, pPSVE1-PyE, which contains the polyoma virus early genes under the control
of the SV40 early promoter, was constructed using a modification of the method of Muller et al., Mol. Cell.
Biol. 4:2406-2412 (1984), which is incorporated herein by reference. Plasmid pPSVE1 was prepared using
pPSG4 (American Type Culture Collection 37337) and SV40 viral DNA (Bethesda Research Laboratories)
essentially as described by Featherstone et al., Nucl. Acids Res. 12:7235-7249 (1984), which is
incorporated herein by reference. Following EcoRI and HincII digestion of plasmid pPyLT-1 (American Type
Culture Collection 41043), a DNA sequence containing the carboxy-terminal coding region of polyoma virus
large T antigen was isolated. The HincII site was converted to an EcoRI site by blunt-end ligation of
phosphorylated EcoRI linkers (Stratagene). Plasmid pPSVE1-PyE was generated by inserting the
carboxy-terminal coding sequence for large T antigen into the unique EcoRI site of plasmid pPSVE1.

Plasmid pZIPNEO-leu was constructed by introducing the EcoRI fragment of PEER-3 cDNA, which
contains the complete coding sequence for human leukosialin, into the unique EcoRI site of plasmid
pZIPNEO (Cepko et al., Cell 37:1053-1063 (1984), which is incorporated herein by reference). Plasmid
structures were confirmed by restriction mapping and by sequencing the construction sites. pZIPNEO was
kindly provided by Dr. Channing Der.

Transfection: CHO DG44 cells were grown in 100 mm tissue culture plates. When the cells were 20%
confluent, they were co-transfected with a 1:4 molar ratio of pZIPNEO-leu and pPSVE1-PyE using the
calcium phosphate technique (Graham and van der Eb, Virology 52:456-467 (1973), which is incorporated
herein by reference). Transfected cells were isolated and maintained in medium containing 400 µg/ml
G-418 (active drug).
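
A molar ratio such as the 1:4 ratio above corresponds to different DNA masses depending on the size of each plasmid. The following is a minimal sketch of that conversion, not the procedure used in the example. The plasmid sizes and the total DNA amount are hypothetical placeholders, since the text states neither, and 650 g/mol per base pair is a common approximation for double-stranded DNA.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair of double-stranded DNA (approximation)

def masses_for_molar_ratio(sizes_bp, ratio, total_ug):
    """Split total_ug of DNA among plasmids so their molar amounts follow `ratio`."""
    # Mass is proportional to (moles x length), so weight each ratio term by size.
    weights = [r * s for r, s in zip(ratio, sizes_bp)]
    total = sum(weights)
    return [total_ug * w / total for w in weights]

def picomoles(mass_ug, size_bp):
    """Convert a plasmid mass in micrograms to picomoles of plasmid."""
    return mass_ug * 1e6 / (size_bp * AVG_BP_MASS)

# Hypothetical values: the source gives neither plasmid sizes nor the DNA amount.
sizes = [12000, 8000]  # assumed sizes in bp for pZIPNEO-leu and pPSVE1-PyE
m_leu, m_pye = masses_for_molar_ratio(sizes, ratio=[1, 4], total_ug=20.0)
print(f"pZIPNEO-leu: {m_leu:.2f} ug, pPSVE1-PyE: {m_pye:.2f} ug")
```

Weighting by size matters because a larger plasmid contributes fewer molecules per microgram than a smaller one.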

Leukosialin expression: The total pool of G418-resistant transfectants was enriched for human
leukosialin-expressing cells by a one-step panning procedure using anti-leukosialin antibodies and goat anti-rabbit IgG
coated panning dishes (Sigma) (Carlsson and Fukuda, J. Biol. Chem. 261:12779-12786 (1986), which is
incorporated herein by reference). Clonal cell lines were obtained by limiting dilution. Six clonal cell lines
expressing human leukosialin on the cell surface were identified by indirect immunofluorescence and
isolated for further studies (Williams and Fukuda, J. Cell Biol. 111:955-966 (1990), which is incorporated
herein by reference).

Polyoma virus-mediated replication: The ability of the six clonal cell lines to support polyoma virus large
T antigen-mediated replication of plasmids was assessed by determining the methylation status of
transfected plasmids containing a polyoma virus origin of replication (Muller et al., supra, 1984; Heffernan
and Dennis, supra, 1991). Plasmid pGT/hCG contains a fused β1-4 galactosyltransferase and human
chorionic gonadotropin α-chain DNA sequence inserted in plasmid pcDNAI, which contains a polyoma virus
replication origin (Aoki et al., Proc. Natl. Acad. Sci. USA 89:4319-4323 (1992), which is incorporated herein
by reference).

Plasmid pGT/hCG was isolated from methylase-positive E. coli strain MC1061/P3 (Invitrogen), which
methylates the adenine residues in the DpnI recognition site, "GATC". The methylated DpnI recognition site
is susceptible to cleavage by DpnI. In contrast, the DpnI recognition site of plasmids replicated in
mammalian cells is not methylated and, therefore, is resistant to DpnI digestion.
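
The logic of the DpnI replication assay can be illustrated with a short sketch. It assumes a linear template, as would result from linearizing the plasmid, and uses a made-up toy sequence. The function name `dpni_fragments` is hypothetical, and the blunt cut between GA and TC reflects the known cleavage position of DpnI.

```python
DPNI_SITE = "GATC"  # DpnI cuts this site only when the adenine is methylated

def dpni_fragments(seq, methylated):
    """Return the fragments left after DpnI digestion of a linear sequence."""
    if not methylated:
        # DNA replicated in mammalian cells is unmethylated and stays intact.
        return [seq]
    fragments, start = [], 0
    pos = seq.find(DPNI_SITE)
    while pos != -1:
        cut = pos + 2  # blunt cut between GA and TC
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(DPNI_SITE, pos + 1)
    fragments.append(seq[start:])
    return fragments

toy = "AAGATCTTTTGATCCC"  # illustrative sequence, not taken from pGT/hCG
print(dpni_fragments(toy, methylated=True))   # input DNA from E. coli: cut
print(dpni_fragments(toy, methylated=False))  # replicated DNA: uncut
```

In the assay, a DpnI-resistant band on the blot therefore indicates plasmid that replicated in the transfected cells.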

Methylated plasmid pGT/hCG was transfected by lipofection into each of the six selected clonal cell
lines expressing leukosialin. After 64 hr, low molecular weight plasmid DNA was isolated from the cells
using the method of Hirt, J. Mol. Biol. 26:365-369 (1967), which is incorporated herein by reference. Isolated
plasmid DNA was digested with XhoI and DpnI (Stratagene), subjected to electrophoresis in a 1% agarose
gel, and transferred to nylon membranes (Micron Separations Inc., MA).

A 0.4 kb SmaI fragment of the β1-4 galactosyltransferase DNA sequence of pGT/hCG was radiolabeled
with [32P]dCTP using the random primer method (Feinberg and Vogelstein, Anal. Biochem. 132:6-13
(1983), which is incorporated herein by reference). Hybridization was performed using methods well known
to those skilled in the art (see, for example, Sambrook et al., supra, (1989)). Following hybridization, the
membranes were washed several times, including a final high stringency wash in 0.1 x SSPE, 0.1% SDS for
1 hr at 65°C, then exposed to Kodak X-AR film at -70°C.

Four of the six clones tested supported replication of the pcDNAI-based plasmid, pGT/hCG (Fig. 3.A.,
lanes 1, 3, 4 and 5). MOP-8 cells, a 3T3 cell line transformed by polyoma virus early genes (Muller et al.,
supra, (1984)), express endogenous core 2 β1-6 N-acetylglucosaminyltransferase activity and were used
as a control for the replication assay (Fig. 3.B., lane 1). One clonal cell line, CHO-Py-leu, which supported pGT/hCG
replication (Fig. 3.A., lane 5; Fig. 3.B., lanes 2 and 3) and expressed a significant amount of
leukosialin, was selected for further studies. pGT/hCG was kindly provided by Dr. Michiko Fukuda.

EXAMPLE III

ISOLATION OF A cDNA SEQUENCE DIRECTING EXPRESSION OF THE HEXASACCHARIDE ON LEUKOSIALIN

Poly(A)+ RNA was isolated from HL-60 promyelocytes, which contain a significant amount of the core 2
β1-6 N-acetylglucosaminyltransferase (Saitoh et al., supra, (1991)). A cDNA expression library,
pcDNAI-HL-60, was prepared (Invitrogen) and the library was screened for clones directing the expression of the T305
antigen.

Plasmid DNA from the pcDNAI-HL-60 cDNA library was transfected into CHO-Py-leu cells using a
modification of the lipofection procedure, described below (Felgner et al., Proc. Natl. Acad. Sci. USA
84:7413-7417 (1987), which is incorporated herein by reference). CHO-Py-leu cells were grown in 100 mm
tissue culture plates. When the cells were 20% confluent, they were washed twice with Opti-MEM I
(GIBCO). Fifty µg of lipofectin reagent (Bethesda Research Laboratories) and 20 µg of purified plasmid
DNA were each diluted to 1.5 ml with Opti-MEM I, then mixed and added to the cells. After incubation for 6
hr at 37°C, the medium was removed, 10 ml of complete medium was added and incubation was continued
for 16 hr at 37°C. The medium was then replaced with 10 ml of fresh medium.

Following a 64 hr period to allow transient expression of the transfected plasmids, the cells were
detached in PBS/5 mM EDTA, pH 7.4, for 30 min at 37°C, pooled, centrifuged and resuspended in cold
PBS/10 mM EDTA/5% fetal calf serum, pH 7.4, containing a 1:200 dilution of ascites fluid containing T305
monoclonal antibody. The cells were incubated on ice for 1 hr, then washed in the same buffer and panned
on dishes coated with goat anti-mouse IgG (Sigma) (Wysocki and Sato, Proc. Natl. Acad. Sci. USA 75:2844-2848
(1978); Seed and Aruffo, Proc. Natl. Acad. Sci. USA 84:3365-3369 (1987), which are incorporated herein
by reference). T305 monoclonal antibody was kindly provided by Dr. R.I. Fox, Scripps Research Foundation,
La Jolla, CA.

Plasmid DNA was recovered from adherent cells by the method of Hirt, supra, (1967), treated with DpnI
to eliminate plasmids that had not replicated in transfected cells, and transformed into E. coli strain
MC1061/P3. Plasmid DNA was then recovered and subjected to a second round of screening. E. coli
transformants containing plasmids recovered from this second enrichment were plated to yield 8 pools of
approximately 500 colonies each. Replica plates were prepared using methods well known to those skilled
in the art (see, for example, Sambrook et al., supra, (1989)).

The pooled plasmid DNA was prepared from replica plates and transfected into CHO-Py-leu cells. The
transfectants were screened by panning. One plasmid pool was selected and subjected to three subsequent
rounds of selection. One plasmid, pcDNAI-C2GnT, which directed the expression of the T305 antigen, was
isolated. CHO-Py-leu cells transfected with pcDNAI-C2GnT express the antigen recognized by T305,
whereas CHO-Py-leu cells transfected with pcDNAI are negative for T305 antigen (Fig. 4). These results
show pcDNAI-C2GnT directs the expression of a new determinant on leukosialin that is recognized by T305
monoclonal antibody. This determinant is the branched hexasaccharide sequence,
NeuNAcα2→3Galβ1→3(NeuNAcα2→3Galβ1→4GlcNAcβ1→6)GalNAc.

EXAMPLE IV

CHARACTERIZATION OF C2GnT

DNA sequence: The cDNA insert in plasmid pcDNAI-C2GnT was sequenced by the dideoxy chain
termination method using Sequenase version 2 reagents (United States Biochemicals) (Sanger et al., Proc.
Natl. Acad. Sci. USA 74:5463-5467 (1977), which is incorporated herein by reference). Both strands were
sequenced using 17-mer synthetic oligonucleotides, which were synthesized as the sequence of the cDNA
insert became known.

Plasmid pcDNAI-C2GnT contains a 2105 base pair insert (Fig. 5). The cDNA sequence (SEQ. ID. NO. 3)
ends 1878 bp downstream of the putative translation start site. A polyadenylation signal is present at
nucleotides 1694-1699. The significance of the large number of nucleotides between the polyadenylation
signal and the beginning of the polyadenyl chain is not clear. However, this sequence is A/T rich.

Deduced amino acid sequence: The cDNA insert in plasmid pcDNAI-C2GnT encodes a single open
reading frame in the sense orientation with respect to the pcDNAI promoter (Fig. 5). The open reading
frame encodes a putative 428 amino acid protein having a molecular mass of 49,790 daltons.
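
The polyadenylation signal is given here as nucleotides 1694-1699, whereas the sequence listing for SEQ ID NO:3 places it at 1913..1918. The two agree if the positions in the text are counted from the first nucleotide of the start codon, which the listing places at nucleotide 220. The short sketch below checks that arithmetic; the counting convention is an inference from these numbers, and the function name is illustrative only.

```python
CDS_START = 220  # first nucleotide of the ATG in SEQ ID NO:3, per the sequence listing

def to_listing_position(pos_from_start_codon):
    """Map a position counted from the A of ATG (= 1) to SEQ ID NO:3 numbering."""
    return pos_from_start_codon + CDS_START - 1

signal = (to_listing_position(1694), to_listing_position(1699))
print(signal)  # (1913, 1918), matching the polyA_signal feature in the listing
```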

Hydropathy analysis indicates the predicted protein is a type II transmembrane molecule, as are all
previously reported mammalian glycosyltransferases (Schachter, supra, (1991)). In this topology, a nine
amino acid cytoplasmic NH2-terminal segment is followed by a 23 amino acid transmembrane domain
flanked by basic amino acid residues. The large COOH-terminus consists of the stem and catalytic domains
and presumably faces the lumen of the Golgi complex.
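
As a rough illustration of how such a topology shows up in a hydropathy calculation, the sketch below averages Kyte-Doolittle values over the two NH2-terminal segments. The choice of the Kyte-Doolittle scale is an assumption, since the text does not name the method used, and the 37-residue peptide is transcribed from the start of the translation shown in SEQ ID NO:3.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the source does not name one)
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}

def mean_hydropathy(peptide):
    """Average Kyte-Doolittle hydropathy over a peptide (one-letter codes)."""
    return sum(KD[aa] for aa in peptide) / len(peptide)

# Residues 1-37 of the deduced C2GnT protein, transcribed from SEQ ID NO:3
N_TERM = "MLRTLLRRRLFSYPTKYYFMVLVLSLITFSVLRIHQK"

cytoplasmic = N_TERM[:9]   # residues 1-9
membrane = N_TERM[9:32]    # residues 10-32, the 23-residue hydrophobic stretch

print(f"cytoplasmic segment: {mean_hydropathy(cytoplasmic):+.2f}")
print(f"membrane segment:    {mean_hydropathy(membrane):+.2f}")
```

The hydrophobic stretch scores clearly positive while the short NH2-terminal segment, with its run of arginines, scores negative, which is the pattern expected for a signal/membrane-anchoring domain.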

The putative protein contains three potential N-glycosylation sites (Fig. 5, asterisks). However, one of
these sites contains a proline residue adjacent to asparagine and is not likely utilized in vivo.
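
Potential N-glycosylation sites are conventionally located by scanning for the Asn-X-Ser/Thr sequon and flagging those in which X is proline. A minimal sketch of such a scan follows. The test peptide is a hypothetical example and not the C2GnT sequence, and `find_sequons` is an illustrative name.

```python
import re

# Asn-X-Ser/Thr; the lookahead lets overlapping sequons be reported
SEQUON = re.compile(r"(?=(N.[ST]))")

def find_sequons(protein):
    """Return (1-based Asn position, tripeptide, proline_at_X) for each sequon."""
    hits = []
    for match in SEQUON.finditer(protein):
        tripeptide = match.group(1)
        hits.append((match.start() + 1, tripeptide, tripeptide[1] == "P"))
    return hits

toy_peptide = "MANCTKNPSLNMT"  # hypothetical peptide for illustration
for position, tripeptide, has_proline in find_sequons(toy_peptide):
    note = "unlikely to be used (Pro at X)" if has_proline else "potential site"
    print(position, tripeptide, note)
```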

No matches were obtained when the C2GnT cDNA sequence and deduced amino acid sequence were
compared with sequences listed in the PC/Gene 6.6 data bank. In particular, no homology was revealed
between the deduced amino acid sequence of C2GnT and other glycosyltransferases, including
N-acetylglucosaminyltransferase I (Sarkar et al., Proc. Natl. Acad. Sci. USA 88:234-238 (1991), which is
incorporated herein by reference).

mRNA expression: Poly(A)+ RNA was prepared using a kit (Stratagene), resolved by electrophoresis
on a 1.2% agarose/2.2 M formaldehyde gel, and transferred to nylon membranes (Micron Separations Inc.,
MA) using methods well known to those skilled in the art (see, for example, Sambrook et al., supra, (1989)).
Membranes were probed using the EcoRI insert of pPROTA-C2GnT (see below) radiolabeled with
[32P]dCTP by the random priming method (Feinberg and Vogelstein, supra, (1983)). Hybridization was performed
in buffers containing 50% formamide for 24 hr at 42°C (Sambrook et al., supra, (1989)). Following
hybridization, filters were washed several times in 1xSSPE/0.1% SDS at room temperature and once in
0.1xSSPE/0.1% SDS at 42°C, then exposed to Kodak X-AR film at -70°C.

Fig. 6 compares the level of core 2 β1-6 N-acetylglucosaminyltransferase mRNA isolated from HL-60
promyelocytes, K562 erythroleukemia cells, and poorly metastatic SP and highly metastatic L4 colonic
carcinoma cells. The major RNA species migrates at a size essentially identical to the ~2.1 kb C2GnT
cDNA sequence. The same result is observed for HL-60 cells and the two colonic cell lines, which
apparently synthesize the hexasaccharides. In addition, two transcripts of ~3.3 kb and 5.4 kb in size were
detected in these cell lines. The two larger transcripts may result from differential usage of polyadenylation
signals.

No hybridization occurred with poly(A)+ RNA isolated from K562 cells, which lack the hexasaccharide
but synthesize the tetrasaccharide (Carlsson et al., supra, (1986), which is incorporated herein by
reference). Similarly, no hybridization was observed for poly(A)+ RNA isolated from CHO-Py-leu cells (Fig. 6,
lane 1).
EXAMPLE V

EXPRESSION OF ENZYMATICALLY ACTIVE β1-6 N-ACETYLGLUCOSAMINYLTRANSFERASE

In order to confirm that C2GnT cDNA encodes core 2 β1-6 N-acetylglucosaminyltransferase,
... cell supernatant or IgG-Sepharose matrix in a total reaction volume of 50 µl.

Reactions were incubated for 1 hr at 37°C, then processed by C18 Sep-Pak chromatography (Waters)
(Palcic et al., J. Biol. Chem. 265:6759-6769 (1990), which is incorporated herein by reference). Core 2 and
core 4 β1-6 N-acetylglucosaminyltransferase were assayed using the acceptors p-nitrophenyl
... J. Biol. Chem. 267:2994-2999 (1992), which is incorporated herein by reference). Synthetic acceptors were
kindly provided by Dr. Ole Hindsgaul, University of Alberta, Canada.

The results of these assays are shown in Table I. Assuming a transfection efficiency of
approximately 20-30%, the level of enzymatic activity directed by cells transfected with pcDNAI-C2GnT is
roughly equivalent to the level observed in HL-60 cells.
In order to unequivocally establish that the C2GnT cDNA sequence encodes core 2 β1-6
N-acetylglucosaminyltransferase, plasmid pPROTA-C2GnT was constructed containing the DNA sequence
encoding the putative catalytic domain of core 2 β1-6 N-acetylglucosaminyltransferase fused in frame with the
signal peptide and IgG binding domain of S. aureus protein A (Fig. 7). The putative catalytic domain is
contained in a 1330 bp fragment of the C2GnT cDNA that encodes amino acid residues 38 to 428. Plasmid
pPROTA was kindly provided by Dr. John B. Lowe.

The polymerase chain reaction (PCR) was used to insert EcoRI recognition sites on either side of the
1330 bp sequence in pcDNAI-C2GnT DNA. PCR was performed using the synthetic oligonucleotide primers
5'-TTTGAATTCCCCTGAATTTGTAAGTGTCAGACAC-3' (SEQ. ID. NO. 5) and
5'-TTTGAATTCGCAGAAACCATGCAGCTTCTCTGA-3' (SEQ. ID. NO. 6) (EcoRI recognition sites underlined).
The EcoRI sites allowed direct, in-frame insertion of the fragment into the unique EcoRI site of plasmid
pPROTA (Sanchez-Lopez et al., J. Biol. Chem. 263:11892-11899 (1988), which is incorporated herein by
reference).

The nucleotide sequence of the insert as well as the proper orientation were confirmed by
sequencing using the primers described above for cDNA sequencing. Plasmid pPROTA-C2GnT allows
secretion of the fusion protein from transfected cells and binding of the secreted fusion protein by ...

Plasmid pPROTA-C2GnT was transfected into COS-1 cells. Following a 64 hr period to allow
transient expression, cell supernatants were collected (Kukowska-Latallo et al., supra, (1990)). Cell
supernatants were cleared by centrifugation, adjusted to 0.05% Tween 20 and either assayed directly for core 2
β1-6 N-acetylglucosaminyltransferase activity or used in IgG-Sepharose (Pharmacia) binding studies. For
the latter assay, supernatants (10 ml) were incubated batchwise with approximately 300 µl of
IgG-Sepharose for 4 hr at 4°C. The matrices were then extensively washed and used directly for
glycosyltransferase assays.

No core 2 β1-6 N-acetylglucosaminyltransferase activity was detected in the medium of COS-1 cells
transfected with the control plasmid, pPROTA. Similarly, no enzymatic activity was associated with
IgG-Sepharose beads. In contrast, a significant level of core 2 β1-6 N-acetylglucosaminyltransferase activity
was detected in the medium of COS-1 cells transfected with pPROTA-C2GnT. The activity also associated
with the IgG-Sepharose beads (Table II). No activity was detected in the supernatant following incubation of
the supernatant with IgG-Sepharose.
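
The design of the forward PCR primer used to build pPROTA-C2GnT can be checked against the sequence listing. The sketch below confirms that both primers carry an EcoRI site (GAATTC) and locates where the 3' end of the forward primer anneals. The cDNA excerpt is nucleotides 331-378 as transcribed from SEQ ID NO:3, the coding-sequence start of 220 comes from the listing, and the function name and minimum match length are illustrative assumptions.

```python
ECORI = "GAATTC"
CDS_START = 220  # position of the ATG in SEQ ID NO:3, per the sequence listing

FWD = "TTTGAATTCCCCTGAATTTGTAAGTGTCAGACAC"  # SEQ ID NO:5
REV = "TTTGAATTCGCAGAAACCATGCAGCTTCTCTGA"   # SEQ ID NO:6

# Nucleotides 331-378 of SEQ ID NO:3, transcribed from the sequence listing
EXCERPT_START = 331
EXCERPT = "CCTGAATTTGTAAGTGTCAGACACTTGGAGCTTGCTGGGGAGAATCCT"

def first_codon_matched(primer, min_len=18):
    """Find the longest 3' stretch of the primer present in EXCERPT.

    Returns (match length, codon number of the first matched nucleotide),
    or None if no stretch of at least min_len nucleotides matches.
    """
    for n in range(len(primer), min_len - 1, -1):
        idx = EXCERPT.find(primer[-n:])
        if idx != -1:
            nucleotide = EXCERPT_START + idx
            return n, (nucleotide - CDS_START) // 3 + 1
    return None

print(ECORI in FWD, ECORI in REV)
print(first_codon_matched(FWD))
```

Within this excerpt the last 24 nucleotides of the forward primer match the cDNA beginning at codon 38, which agrees with the statement that the amplified fragment encodes amino acid residues 38 to 428.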
TABLE II

Determination of Enzymatic Activities Directed by pPROTA-C2GnT

                                                          Radioactivity (cpm)
Acceptor                  Linkage formed (enzyme)       (-) acceptor   (+) acceptor

Galβ1→3GalNAc             GlcNAcβ1→6 (core 2-GnT)            109           1048
GlcNAcβ1→3GalNAc          GlcNAcβ1→6 (core 4-GnT)            111            113
GlcNAcβ1→2Man             GlcNAcβ1→6 (GnTV)                  118            115
GlcNAcβ1→3Gal             GlcNAcβ1→6 (I-GnT)                 111            113
Galβ1→4GlcNAcβ1→3Gal      GlcNAcβ1→6 (I-GnT)                  99             96

COS-1 cells were transfected with pPROTA-C2GnT and the
conditioned media were incubated with IgG-Sepharose. The
proteins bound to the IgG-Sepharose were assayed for β1→6
N-acetylglucosaminyltransferase activity by using
appropriate acceptors. The linkages formed are listed
in the second column. Similar results were obtained in three
independent experiments.
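
The specificity shown in Table II can be summarized as the ratio of radioactivity measured with and without acceptor. The sketch below computes that ratio from the cpm values in the table. It assumes the larger core 2 value (1048 cpm) is the measurement with acceptor, consistent with the text, and the label "I-GnT (internal Gal)" for the last row and the twofold cutoff are illustrative choices rather than criteria stated in the source.

```python
# (cpm without acceptor, cpm with acceptor), transcribed from Table II
TABLE_II = {
    "core 2-GnT": (109, 1048),
    "core 4-GnT": (111, 113),
    "GnTV": (118, 115),
    "I-GnT": (111, 113),
    "I-GnT (internal Gal)": (99, 96),
}

def fold_over_background(minus, plus):
    """Ratio of incorporated radioactivity with acceptor to that without."""
    return plus / minus

folds = {name: fold_over_background(*cpm) for name, cpm in TABLE_II.items()}
active = [name for name, fold in folds.items() if fold > 2.0]  # arbitrary cutoff

for name, fold in folds.items():
    print(f"{name:22s} {fold:5.2f}")
print("active with:", active)
```

Only the core 2 acceptor gives a signal well above background (roughly ninefold), while the other four acceptors stay at about 1.0.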
EXAMPLE VI

DETERMINATION OF C2GnT SPECIFICITY

Four types of β1-6 N-acetylglucosaminyltransferase linkages have been reported, including core 2 and
core 4 in O-glycans, I-antigen and a branch attached to mannose that forms tetraantennary N-glycans (see
Table II). In order to determine whether these different structures are also synthesized by the cloned C2GnT
cDNA sequence, enzymatic activity was determined using five different acceptors.

As shown in Table II, the fusion protein was only active with the acceptor for core 2 formation. The
same was true when the formation of β1-6 N-acetylglucosaminyl linkage to internal galactose residues was
examined (Table II, see structure at bottom). This result precludes the likelihood that the enzyme encoded
by the C2GnT cDNA sequence may add N-acetylglucosamine to a non-reducing terminal galactose. The
HL-60 core 2 β1-6 N-acetylglucosaminyltransferase is exclusively responsible for the formation of the
GlcNAcβ1-6 branch on Galβ1-3GalNAc.

Although the invention has been described with reference to the disclosed embodiments, it should be
understood that various modifications can be made without departing from the spirit of the invention.
Accordingly, the invention is limited only by the following claims.

Lowe et al., Cell 63:475-484 (1990)
Brandley et al., Cell 63:861-863 (1990)
Phillips et al., Science 250:1130-1132 (1990)
Walz et al., Science 250:1132-1135 (1990)
Higgins et al., J. Biol. Chem. 266:6280-6290 (1991)
Schachter, Biochem. Cell Biol. 64:163-181 (1986)




EP 0 590 747 A2 



SEQUENCE LISTING 



(1) GENERAL INFORMATION: 

(i) APPLICANT: 

(A) NAME: La Jolla Cancer Research Foundation 

(B) STREET: 10901 North Torrey Pines Road 

(C) CITY: La Jolla 

(D) STATE: California 

(E) COUNTRY: U.S.A. 

(F) POSTAL CODE (ZIP): 92037 



(ii) TITLE OF INVENTION: A NOVEL BETA1-6
N-ACETYLGLUCOSAMINYLTRANSFERASE, ITS ACCEPTOR MOLECULE,
LEUKOSIALIN AND A METHOD FOR CLONING PROTEINS HAVING
ENZYMATIC ACTIVITY

(iii) NUMBER OF SEQUENCES: 8

(iv) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.25 (EPO)



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 900 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: both
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 841..900

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 91..192
(D) OTHER INFORMATION: /note= "EXON 1' IS LOCATED IN BOTH
GENOMIC AND cDNA. IN THE cDNA EXON 1' IS
IMMEDIATELY FOLLOWED BY EXON 2."

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 359..428
(D) OTHER INFORMATION: /note= "EXON 1 IS LOCATED IN
GENOMIC DNA"

(ix) FEATURE:
(A) NAME/KEY: intron
(B) LOCATION: 193..806
(D) OTHER INFORMATION: /note= "THIS SEGMENT OF NUCLEIC
ACID CONSTITUTES INTRON SEQUENCE OF THE cDNA"

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 807..900
(D) OTHER INFORMATION: /note= "EXON 2 IS LOCATED IN BOTH
GENOMIC AND cDNA. IN THE cDNA EXON 2 IMMEDIATELY
FOLLOWS EXON 1'."

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:



TTGGGGACCA CAAATGCAAA GGAAACCACC CTCCCCTCCC ACCTCCTCCT CTGCACCCTT        60

GAGTTCTCAG GCTCACATTC CCACCACCCA CCTCTGAGCC CAGCCCTCCC TAGCATCACC       120

ACTTCCATCC CATTCCTCAG CCAAGAGCCA GGAATCCTGA TTCCAGATCC CACGCTTCCC       180

TGCCTCCCTC AGGTGAGCCC CAGACCCCCA GGCACCCCGC TGGCCCCTGA AGGAGCAGGT       240

GATGGTGCTG TCTTCGCCCA GCAGCTGTGG GAGCAGGCGG GTGGGGCAGG ATGGAGGGGT       300

GGGTGGGGTG GGTGGAGCCA GGGCCCACTT CCTTTCCCCT TGGGGCCCTG TCCTTCCCAG       360

TCTTGCCCCA GCCTCGGGAG GTGGTGGAGT GACCTGGCCC CAGTGCTGCG TCCTTATCAG       420

CCGAGCCGGT AAGAGGGTGA GACTTGGTGG GGTAGGGGCC TCAGTGGGCC TGGGAATGTG       480

CCTGTGGCTT GAAAAGACTC TGACAGGTTA TGATGGGAAG AGATTGGGAG CCATTGGGCT       540

GCACAGGGTC AGGGAAGGCC AGGAGGGGCT GGTCACTGCT GGAATCTAAG CTGCTGAGGC       600

TGGAGGGAGC CTCAGGATGG GGCTGATGGG GGAGCTGCCA GCATCTGTTC CTCTGTCATT       660

TCTGATAACA GTAAAAGCCA GCATGGAAAA AACCGTTAAA CCGCAGGTTG GGCCTGGCCG       720

TTGGCAGGGA AGTGGGCAGA GGGGAGGCCC GGCCAGGTCC TCCGGCAACT CCCGCGTGTT       780

CTGCTTCTCC GGCTGCCCAC CTGCAGGTCC CAGCTCTTGC TCCTGCCTGT TTGCCTGGAA       840

ATG GCC ACG CTT CTC CTT CTC CTT GGG GTG CTG GTG GTA AGC CCA GAC        888
Met Ala Thr Leu Leu Leu Leu Leu Gly Val Leu Val Val Ser Pro Asp
  1               5                  10                  15

GCT CTG GGG AGC                                                        900
Ala Leu Gly Ser
             20



(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Met Ala Thr Leu Leu Leu Leu Leu Gly Val Leu Val Val Ser Pro Asp
  1               5                  10                  15

Ala Leu Gly Ser
(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2105 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: both
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 220..1504

(ix) FEATURE:
(A) NAME/KEY: polyA_signal
(B) LOCATION: 1913..1918

(ix) FEATURE:
(A) NAME/KEY: misc_signal
(B) LOCATION: 248..314
(D) OTHER INFORMATION: /standard_name=
"SIGNAL/MEMBRANE-ANCHORING DOMAIN"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

GTGAAGTGCT CAGAATGGGG CAGGATGTCA CCTGGAATCA GCACTAAGTG ATTCAGACTT 60

TCCTTACTTT TAAATGTGCT GCTCTTCATT TCAAGATGCC GTTGCAGCTC TGATAAATGC 120

AAACTGACAA CCTTCAAGGC CACGACGGAG GGAAAATCAT TGGTGCTTGG AGCATAGAAG 180

ACTGCCCTTC ACAAAGGAAA TCCCTGATTA TTGTTTGAA ATG CTG AGG ACG TTG 234
                                           Met Leu Arg Thr Leu
                                           1               5

CTG CGA AGG AGA CTT TTT TCT TAT CCC ACC AAA TAC TAC TTT ATG GTT 282
Leu Arg Arg Arg Leu Phe Ser Tyr Pro Thr Lys Tyr Tyr Phe Met Val
                10                  15                  20

CTT GTT TTA TCC CTA ATC ACC TTC TCC GTT TTA AGG ATT CAT CAA AAG 330
Leu Val Leu Ser Leu Ile Thr Phe Ser Val Leu Arg Ile His Gln Lys
            25                  30                  35

CCT GAA TTT GTA AGT GTC AGA CAC TTG GAG CTT GCT GGG GAG AAT CCT 378
Pro Glu Phe Val Ser Val Arg His Leu Glu Leu Ala Gly Glu Asn Pro
        40                  45                  50

AGT AGT GAT ATT AAT TGC ACC AAA GTT TTA CAG GGT GAT GTA AAT GAA 426
Ser Ser Asp Ile Asn Cys Thr Lys Val Leu Gln Gly Asp Val Asn Glu
    55                  60                  65

ATC CAA AAG GTA AAG CTT GAG ATC CTA ACA GTG AAA TTT AAA AAG CGC 474
Ile Gln Lys Val Lys Leu Glu Ile Leu Thr Val Lys Phe Lys Lys Arg
70                  75                  80                  85

CCT CGG TGG ACA CCT GAC GAC TAT ATA AAC ATG ACC AGT GAC TGT TCT 522
Pro Arg Trp Thr Pro Asp Asp Tyr Ile Asn Met Thr Ser Asp Cys Ser
                90                  95                  100


EP 0 590 747 A2 






TCT TTC ATC AAG AGA CGC AAA TAT ATT GTA GAA CCC CTT AGT AAA GAA 570
Ser Phe Ile Lys Arg Arg Lys Tyr Ile Val Glu Pro Leu Ser Lys Glu
            105                 110                 115

GAG GCG GAG TTT CCA ATA GCA TAT TCT ATA GTG GTT CAT CAC AAG ATT 618
Glu Ala Glu Phe Pro Ile Ala Tyr Ser Ile Val Val His His Lys Ile
        120                 125                 130

GAA ATG CTT GAC AGG CTG CTG AGG GCC ATC TAT ATG CCT CAG AAT TTC 666
Glu Met Leu Asp Arg Leu Leu Arg Ala Ile Tyr Met Pro Gln Asn Phe
    135                 140                 145

TAT TGC GTT CAT GTG GAC ACA AAA TCC GAG GAT TCC TAT TTA GCT GCA 714
Tyr Cys Val His Val Asp Thr Lys Ser Glu Asp Ser Tyr Leu Ala Ala
150                 155                 160                 165

GTG ATG GGC ATC GCT TCC TGT TTT AGT AAT GTC TTT GTG GCC AGC CGA 762
Val Met Gly Ile Ala Ser Cys Phe Ser Asn Val Phe Val Ala Ser Arg
                170                 175                 180

TTG GAG AGT GTG GTT TAT GCA TCG TGG AGC CGG GTT CAG GCT GAC CTC 810
Leu Glu Ser Val Val Tyr Ala Ser Trp Ser Arg Val Gln Ala Asp Leu
            185                 190                 195

AAC TGC ATG AAG GAT CTC TAT GCA ATG AGT GCA AAC TGG AAG TAC TTG 858
Asn Cys Met Lys Asp Leu Tyr Ala Met Ser Ala Asn Trp Lys Tyr Leu
        200                 205                 210

ATA AAT CTT TGT GGT ATG GAT TTT CCC ATT AAA ACC AAC CTA GAA ATT 906
Ile Asn Leu Cys Gly Met Asp Phe Pro Ile Lys Thr Asn Leu Glu Ile
    215                 220                 225

GTC AGG AAG CTC AAG TTG TTA ATG GGA GAA AAC AAC CTG GAA ACG GAG 954
Val Arg Lys Leu Lys Leu Leu Met Gly Glu Asn Asn Leu Glu Thr Glu
230                 235                 240                 245

AGG ATG CCA TCC CAT AAA GAA GAA AGG TGG AAG AAG CGG TAT GAG GTC 1002
Arg Met Pro Ser His Lys Glu Glu Arg Trp Lys Lys Arg Tyr Glu Val
                250                 255                 260

GTT AAT GGA AAG CTG ACA AAC ACA GGG ACT GTC AAA ATG CTT CCT CCA 1050
Val Asn Gly Lys Leu Thr Asn Thr Gly Thr Val Lys Met Leu Pro Pro
            265                 270                 275

CTC GAA ACA CCT CTC TTT TCT GGC AGT GCC TAC TTC GTG GTC AGT AGG 1098
Leu Glu Thr Pro Leu Phe Ser Gly Ser Ala Tyr Phe Val Val Ser Arg
        280                 285                 290

GAG TAT GTG GGG TAT GTA CTA CAG AAT GAA AAA ATC CAA AAG TTG ATG 1146
Glu Tyr Val Gly Tyr Val Leu Gln Asn Glu Lys Ile Gln Lys Leu Met
    295                 300                 305

GAG TGG GCA CAA GAC ACA TAC AGC CCT GAT GAG TAT CTC TGG GCC ACC 1194
Glu Trp Ala Gln Asp Thr Tyr Ser Pro Asp Glu Tyr Leu Trp Ala Thr
310                 315                 320                 325

ATC CAA AGG ATT CCT GAA GTC CCG GGC TCA CTC CCT GCC AGC CAT AAG 1242
Ile Gln Arg Ile Pro Glu Val Pro Gly Ser Leu Pro Ala Ser His Lys
                330                 335                 340

TAT GAT CTA TCT GAC ATG CAA GCA GTT GCC AGG TTT GTC AAG TGG CAG 1290
Tyr Asp Leu Ser Asp Met Gln Ala Val Ala Arg Phe Val Lys Trp Gln
            345                 350                 355

TAC TTT GAG GGT GAT GTT TCC AAG GGT GCT CCC TAC CCG CCC TGC GAT 1338
Tyr Phe Glu Gly Asp Val Ser Lys Gly Ala Pro Tyr Pro Pro Cys Asp
        360                 365                 370

GGA GTC CAT GTG CGC TCA GTG TGC ATT TTC GGA GCT GGT GAC TTG AAC 1386
Gly Val His Val Arg Ser Val Cys Ile Phe Gly Ala Gly Asp Leu Asn
    375                 380                 385

TGG ATG CTG CGC AAA CAC CAC TTG TTT GCC AAT AAG TTT GAC GTG GAT 1434
Trp Met Leu Arg Lys His His Leu Phe Ala Asn Lys Phe Asp Val Asp
390                 395                 400                 405

GTT GAC CTC TTT GCC ATC CAG TGT TTG GAT GAG CAT TTG AGA CAC AAA 1482
Val Asp Leu Phe Ala Ile Gln Cys Leu Asp Glu His Leu Arg His Lys
                410                 415                 420

GCT TTG GAG ACA TTA AAA CAC T GACCATTACG GGCAATTTTA TGAACAAGAA 1534
Ala Leu Glu Thr Leu Lys His
            425



GAAGGATACA CAAAACGTAC CTTATCTGTT TCCCCTTCCT TGTCAGCGTC GGGAAGATGG 1594

TATGAAGTCC TCTTTGGGGC AGGGACTCTA GTAGATCTTC TTGTCAGAGA AGCTGCATGG 1654

TTTCTGCAGA GCACAGTTAG CTAGAAAGGT GATAGCATTA AATGTTCATC TAGAGTTAAT 1714

AGTGGGAGGA GTAAAGGTAG CCTTGAGGCC AGAGCAGGTA GCAAGGCATT GTGGAAAGAG 1774

GGGACCAGGG TGGCTGGGGA AGAGGCCGAT GCATAAAGTC AGCCTGTTCC AAGTGCTCAG 1834

GGACTTAGCA AAATGAGAAG ATGTGACCTG TGCCAAAACT ATTTTGAGAA TTTTAAATGT 1894

GACCATTTTT CTGGTATGAA TAAACTTACA GCAACAAATA ATCAAAGATA CAATTAATCT 1954

GATATTATAT TTGTTGAAAT AGAAATTTGA TTGTACTATA AATGATTTTT GTAAATAATT 2014

TATATTCTGC TCTAATACTG TACTGTGTAG TGTGTCTCCG TATGTCATCT CAGGGAGCTT 2074

AAAATGGGCT TGATTTAACA TTGAAAAAAA A 2105
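The feature locations given for SEQ ID NO: 3 are 1-based, inclusive nucleotide coordinates, so they can be read back against the running position counters printed at the end of each row. The sketch below is illustrative only and is not part of the original listing. It extracts the polyA_signal feature (1913..1918) from the row covering nucleotides 1895 to 1954. It also checks the codon count of the coding region on the assumption, taken from the row counters, that the first codon starts at nucleotide 220 and the last sense codon ends at nucleotide 1503. The function and variable names are hypothetical.

```python
def feature(segment, segment_start, first, last):
    """Return bases first..last (1-based, inclusive) of a longer sequence,
    given only a segment of it whose first base sits at segment_start."""
    segment_end = segment_start + len(segment) - 1
    if first < segment_start or last > segment_end or first > last:
        raise ValueError("feature lies outside the supplied segment")
    return segment[first - segment_start:last - segment_start + 1]


# Row of SEQ ID NO: 3 whose position counter reads 1954 (bases 1895..1954).
ROW_1895_1954 = (
    "GACCATTTTT CTGGTATGAA TAAACTTACA "
    "GCAACAAATA ATCAAAGATA CAATTAATCT"
).replace(" ", "")

polya_signal = feature(ROW_1895_1954, 1895, 1913, 1918)
print("polyA_signal 1913..1918:", polya_signal)

# Assumption from the row counters: coding codons span 220..1503.
CDS_FIRST, LAST_SENSE_BASE = 220, 1503
coding_codons = (LAST_SENSE_BASE - CDS_FIRST + 1) // 3
print("codons in coding region:", coding_codons)
```

Under these assumptions the codon count agrees with the 428-residue length stated for SEQ ID NO: 4, and the extracted hexamer is the canonical polyadenylation signal.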



(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 428 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4: 

Met Leu Arg Thr Leu Leu Arg Arg Arg Leu Phe Ser Tyr Pro Thr Lys
1               5                   10                  15

Tyr Tyr Phe Met Val Leu Val Leu Ser Leu Ile Thr Phe Ser Val Leu
            20                  25                  30

Arg Ile His Gln Lys Pro Glu Phe Val Ser Val Arg His Leu Glu Leu
        35                  40                  45

Ala Gly Glu Asn Pro Ser Ser Asp Ile Asn Cys Thr Lys Val Leu Gln
    50                  55                  60

Gly Asp Val Asn Glu Ile Gln Lys Val Lys Leu Glu Ile Leu Thr Val
65                  70                  75                  80

Lys Phe Lys Lys Arg Pro Arg Trp Thr Pro Asp Asp Tyr Ile Asn Met
                85                  90                  95

Thr Ser Asp Cys Ser Ser Phe Ile Lys Arg Arg Lys Tyr Ile Val Glu
            100                 105                 110

Pro Leu Ser Lys Glu Glu Ala Glu Phe Pro Ile Ala Tyr Ser Ile Val
        115                 120                 125

Val His His Lys Ile Glu Met Leu Asp Arg Leu Leu Arg Ala Ile Tyr
    130                 135                 140

Met Pro Gln Asn Phe Tyr Cys Val His Val Asp Thr Lys Ser Glu Asp
145                 150                 155                 160

Ser Tyr Leu Ala Ala Val Met Gly Ile Ala Ser Cys Phe Ser Asn Val
                165                 170                 175

Phe Val Ala Ser Arg Leu Glu Ser Val Val Tyr Ala Ser Trp Ser Arg
            180                 185                 190

Val Gln Ala Asp Leu Asn Cys Met Lys Asp Leu Tyr Ala Met Ser Ala
        195                 200                 205

Asn Trp Lys Tyr Leu Ile Asn Leu Cys Gly Met Asp Phe Pro Ile Lys
    210                 215                 220

Thr Asn Leu Glu Ile Val Arg Lys Leu Lys Leu Leu Met Gly Glu Asn
225                 230                 235                 240

Asn Leu Glu Thr Glu Arg Met Pro Ser His Lys Glu Glu Arg Trp Lys
                245                 250                 255

Lys Arg Tyr Glu Val Val Asn Gly Lys Leu Thr Asn Thr Gly Thr Val
            260                 265                 270

Lys Met Leu Pro Pro Leu Glu Thr Pro Leu Phe Ser Gly Ser Ala Tyr
        275                 280                 285

Phe Val Val Ser Arg Glu Tyr Val Gly Tyr Val Leu Gln Asn Glu Lys
    290                 295                 300

Ile Gln Lys Leu Met Glu Trp Ala Gln Asp Thr Tyr Ser Pro Asp Glu
305                 310                 315                 320

Tyr Leu Trp Ala Thr Ile Gln Arg Ile Pro Glu Val Pro Gly Ser Leu
                325                 330                 335

Pro Ala Ser His Lys Tyr Asp Leu Ser Asp Met Gln Ala Val Ala Arg
            340                 345                 350

Phe Val Lys Trp Gln Tyr Phe Glu Gly Asp Val Ser Lys Gly Ala Pro
        355                 360                 365

Tyr Pro Pro Cys Asp Gly Val His Val Arg Ser Val Cys Ile Phe Gly
    370                 375                 380

Ala Gly Asp Leu Asn Trp Met Leu Arg Lys His His Leu Phe Ala Asn
385                 390                 395                 400

Lys Phe Asp Val Asp Val Asp Leu Phe Ala Ile Gln Cys Leu Asp Glu
                405                 410                 415

His Leu Arg His Lys Ala Leu Glu Thr Leu Lys His
            420                 425

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 34 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
TTTGAATTCC CCTGAATTTG TAAGTGTCAG ACAC 34 

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 33 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
TTTGAATTCG CAGAAACCAT GCAGCTTCTC TGA 33
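
The two oligonucleotides of SEQ ID NOS: 5 and 6 each begin with TTT followed by an EcoRI recognition site (GAATTC), and their 3' portions appear to correspond to SEQ ID NO: 3, one on the listed strand and one on its complement. The sketch below is an illustration rather than part of the original disclosure, and the reading of these oligonucleotides as an amplification primer pair is an inference from the sequences alone. It locates the longest 3' stretch of each oligonucleotide that matches the cDNA exactly. The template fragments are copied from the rows of SEQ ID NO: 3 that end at positions 378 and 1654, plus the first ten bases of the following row. All function and variable names are hypothetical.

```python
ECORI_SITE = "GAATTC"

FORWARD_PRIMER = "TTTGAATTCC CCTGAATTTG TAAGTGTCAG ACAC".replace(" ", "")  # SEQ ID NO: 5
REVERSE_PRIMER = "TTTGAATTCG CAGAAACCAT GCAGCTTCTC TGA".replace(" ", "")   # SEQ ID NO: 6

# Nucleotides 331..378 of SEQ ID NO: 3 (listed strand).
CDNA_331_378 = (
    "CCT GAA TTT GTA AGT GTC AGA CAC TTG GAG CTT GCT GGG GAG AAT CCT"
).replace(" ", "")

# Nucleotides 1595..1664 of SEQ ID NO: 3 (listed strand).
CDNA_1595_1664 = (
    "TATGAAGTCC TCTTTGGGGC AGGGACTCTA GTAGATCTTC "
    "TTGTCAGAGA AGCTGCATGG TTTCTGCAGA"
).replace(" ", "")

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_3prime_match(primer, target):
    """Return the longest suffix (3' end) of primer found verbatim in target."""
    for start in range(len(primer)):
        if primer[start:] in target:
            return primer[start:]
    return ""


# The forward oligonucleotide is compared with the listed strand,
# the reverse oligonucleotide with the complementary strand.
fwd_anneal = longest_3prime_match(FORWARD_PRIMER, CDNA_331_378)
rev_anneal = longest_3prime_match(REVERSE_PRIMER, reverse_complement(CDNA_1595_1664))

# Convert fragment offsets back to SEQ ID NO: 3 coordinates (1-based).
fwd_start = 331 + CDNA_331_378.find(fwd_anneal)
rev_start = 1595 + CDNA_1595_1664.find(reverse_complement(rev_anneal))

print("EcoRI site offset in each primer:",
      FORWARD_PRIMER.find(ECORI_SITE), REVERSE_PRIMER.find(ECORI_SITE))
print("forward match:", fwd_anneal, "starting at", fwd_start)
print("reverse match:", rev_anneal, "covering from", rev_start)
```

Searching for the longest exact 3' match, rather than the whole oligonucleotide, is a deliberate choice here: the 5' tail carrying the restriction site is not expected to occur in the template.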

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..15

(D) OTHER INFORMATION: /note= "PROTEIN A - C2GNT FUSION PROTEIN"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

GGG AAT TCC CCT GAA 15
Gly Asn Ser Pro Glu
1               5



(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

Gly Asn Ser Pro Glu
1               5

Claims 

1. A purified human protein or an active fragment thereof having β1-6 N-acetylglucosaminyltransferase
activity.

2. The purified protein of claim 1, wherein said activity is that of UDP-GlcNAc:Galβ1-3GalNAc (GlcNAc to
GalNAc) β1-6 N-acetylglucosaminyltransferase.

3. The purified protein of claim 2, wherein said protein has a relative molecular weight of about 50 kD.

4. An isolated nucleic acid encoding the human protein or active fragment thereof of claim 1.

5. A vector containing the nucleic acid of claim 4.

6. The vector of claim 5, wherein said vector is a plasmid.

7. The vector of claim 5, wherein said vector is pcDNAI-C2GnT.

8. A host cell containing the vector of claim 5.

9. A purified human protein or a fragment thereof that is an acceptor molecule, said acceptor molecule
being acted upon by the protein of claim 2 having activity which exclusively forms core 2 oligosaccharide
structures in O-glycans.

10. The acceptor molecule of claim 9, wherein said acceptor molecule is leukosialin, CD43.

11. An isolated nucleic acid encoding the acceptor molecule of claim 9.

12. A vector containing the nucleic acid of claim 11.

13. The vector of claim 12, wherein said vector is a plasmid.

14. The vector of claim 12, wherein said vector is pcDSRα-leu.

15. A host cell containing the vector of claim 12.

16. A method of obtaining from a cell line, which does not normally contain a protein having catalytic
activity or an acceptor molecule for said protein, a nucleic acid encoding said protein having catalytic
activity comprising:

a. transfecting said cell line with a DNA sequence encoding the acceptor molecule, wherein the
acceptor molecule is stably expressed in the cell line;

b. transfecting said cell line with a cDNA library containing said nucleic acid in a vector, wherein
proteins encoded by the transfected cDNA are transiently expressed;

c. screening the transfected cells for expression of said protein having catalytic activity; and

d. isolating the nucleic acid encoding the protein having catalytic activity.

17. The vector of claim 16, wherein said vector replicates in the transfected cell line.

18. The vector in claim 17, wherein said vector is a plasmid.

19. The vector of claim 16, wherein said vector contains a viral replication origin.

20. The vector of claim 19, wherein said replication origin is the polyoma virus replication origin.

21. The cell line of claim 16, wherein said cell line supports replication of a vector.

22. The cell line of claim 16, wherein said cell line expresses polyoma virus large T antigen.

23. The cell line of claim 16, wherein said cell line is the Chinese hamster ovary cell line.

24. The cell line of claim 23, wherein said cell line is CHO-Py-leu.

25. A method of isolating a polypeptide having catalytic activity that forms core 2 oligosaccharide
structures in O-glycans, said method comprising growing the host cell of claim 8 under conditions
which favor expression of a nucleic acid encoding said polypeptide, and isolating said polypeptide so
produced.